Abstract

Background: Lactobacillus salivarius strains are increasingly being exploited for their probiotic properties in humans 
and animals. Dissemination of antibiotic resistance genes among species with food or probiotic-association is 
undesirable and is often mediated by plasmids or integrative and conjugative elements. L. salivarius strains typically 
have multireplicon genomes including circular megaplasmids that encode strain-specific traits for intestinal survival 
and probiotic activity. Linear plasmids are less common in lactobacilli and show a very limited distribution in 
L. salivarius. Here we present experimental evidence that supports an unusually complex multireplicon genome 
structure in the porcine isolate L. salivarius JCM1046.

Results: JCM1046 harbours a 1.83 Mb chromosome, and four plasmids which constitute 20% of the genome. In 
addition to the known 219 kb repA-type megaplasmid pMP1046A, we identified and experimentally validated the 
topology of three additional replicons, the circular pMP1046B (129 kb), a linear plasmid pLMP1046 (101 kb) and 
pCTN1046 (33 kb) harbouring a conjugative transposon. pMP1046B harbours both plasmid-associated replication 
genes and paralogues of chromosomally encoded housekeeping and information-processing related genes, thus 
qualifying it as a putative chromid. pLMP1046 shares limited sequence homology or gene synteny with other 
L. salivarius plasmids, and its putative replication-associated protein is homologous to the RepA/E proteins found in 
the large circular megaplasmids of L. salivarius. Plasmid pCTN1046 harbours a single copy of an integrated conjugative
transposon (Tn6224) which appears to be functionally intact and includes the tetracycline resistance gene tetM. 

Conclusion: Experimental validation of sequence assemblies and plasmid topology resolved the complex genome 
architecture of L. salivarius JCM1046. A high-coverage draft genome sequence would not have elucidated the genome
complexity in this strain. Given the expanding use of L. salivarius as a probiotic, it is important to determine the
genotypic and phenotypic organization of L. salivarius strains. The identification of Tn6224-like elements in this species 
has implications for strain selection for probiotic applications. 
Background

Lactobacillus salivarius [1] is a member of the indigenous microbiota of the oral cavity and the
gastrointestinal tract (GIT) of both humans and animals [2,3], and has also been isolated from
human breast milk [4]. The probiotic and immunomodulatory activities of L. salivarius strains
have recently been reviewed [5] and are considered to be strain-specific traits [6]. Strains of
L. salivarius are genetically diverse [7] and harbour distinctive multireplicon genomes. The first
genome of this species to be published [8,9] was that of the well-characterised strain
L. salivarius UCC118 [1,10-13], whose megaplasmid pMP118 (242 kb) encodes genes involved in GIT
survival, fitness and probiotic activity [9-11]. L. salivarius strains from a range of
environmental sources harbour diverse circular megaplasmids [7,12]. At least 10 additional
L. salivarius genomes have been sequenced since that of strain UCC118; three of these have been
completed (strains CECT 5713 [14], NIAS840 [15] and SMXD51 [16]), with two being finished to a
draft quality status [17,18].

Unlike circular plasmids, linear plasmids are rarely observed in lactobacilli [12] but often
confer advantageous phenotypes to their hosts [19,20] and have been extensively studied in
Streptomyces [21,22], Borrelia [23] and Bacillus [24]. Linear phage genomes are also harboured
by strains of Escherichia coli [25], Yersinia enterocolitica [26], Klebsiella oxytoca [27] as
well as the probiotic cheese strain Lactobacillus paracasei NFBC 338 [28]. Prior to the discovery
of linear megaplasmids in L. salivarius [12], a 150 kb linear plasmid was identified in
Lactobacillus gasseri CNRZ222 [29], but no characterization of the plasmid was performed. We
previously identified linear megaplasmids in two porcine L. salivarius isolates, JCM1046 and
JCM1047, and one human intestinal isolate, AH43348 [12].

The conjugative transposon (CT) Tn916 (18.5 kb) [30] and other Tn916-like elements are highly
promiscuous [31], both in the lab and in natural environments [32]. They have demonstrated
intra- and interspecies transfer from Lactococcus lactis [33] and Lactobacillus paracasei [34]
food strains, and between streptococcal species in dental biofilms [35]. There is a growing
concern that commensal bacteria may act as natural reservoirs for antibiotic resistance
determinants [36] and may be responsible for transfer of antibiotic resistance to pathogens and
opportunistic pathogens [37]. In addition to introducing additional functional modules to the
host cell, CTs have further potential to influence natural selection within a bacterial
population [38]. There is therefore a growing need to characterize these mobile elements,
particularly in species used in food or as probiotics.

Here we present experimental evidence for a highly un- 
usual genome architecture in L. salivarius JCM1046, a 
strain that harbours multiple extrachromosomal replicons 
of varying sizes and topologies and which has an en- 
hanced ability to withstand the stresses associated with 
GIT survival [11]. The present study describes an unpre- 
cedented level of genome complexity in L. salivarius. 

Results and discussion 

Discovery of circular and linear extrachromosomal 
elements in L. salivarius JCM1046 

Sequencing revealed that L. salivarius JCM1046 contains five replicons (Table 1): a 1.836 Mb
chromosome, two circular megaplasmids of 219 and 129 kb, a linear megaplasmid of 101 kb, and a
33 kb plasmid harbouring an integrated conjugative transposon (Figure 1). The complexity of this
genome configuration presented extraordinary challenges for genome assembly, described below.
Experimental validation of the genome structure is presented in Figure 2. L. salivarius strains
JCM1047 and AH43348 were known to harbour linear megaplasmids that were presumed to be related
to pLMP1046 [12] and were therefore included in these experiments.

Table 1 General genome features of L. salivarius JCM1046

Feature               Chromosome    pMP1046A
Replicon size (bp)    1,836,297     219,748
GC content (%)        33.1          32.04
Topology              Circular      Circular
% of genome size      79.1          9.4
Coding genes          1705          214
Coding density (%)    83.3          80.7
rRNA operons          7             0
tRNAs                 76            0
Pseudogenes           60            15
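The "% of genome size" row of Table 1 can be recomputed from the replicon sizes. The following is a minimal sketch rather than the authors' procedure; the function names are illustrative, and the observation that the table values match truncation to one decimal place (rather than rounding) is an inference from the arithmetic, not something the paper states.

```python
# Replicon sizes in bp, copied from Table 1.
REPLICON_SIZES_BP = {
    "chromosome": 1_836_297,
    "pMP1046A": 219_748,
    "pMP1046B": 129_218,
    "pLMP1046": 101_883,
    "pCTN1046": 33_315,
}


def genome_shares(sizes):
    """Return each replicon's share of the total genome, in percent."""
    total = sum(sizes.values())
    return {name: 100.0 * size / total for name, size in sizes.items()}


def truncate_tenths(value):
    """Truncate (not round) a percentage; the result is in integer tenths."""
    return int(value * 10)


shares = genome_shares(REPLICON_SIZES_BP)
# Everything that is not chromosome is plasmid-borne.
plasmid_share = 100.0 - shares["chromosome"]

for name, pct in shares.items():
    print(f"{name:<11} {pct:6.2f} %")
print(f"{'plasmids':<11} {plasmid_share:6.2f} %")
```

The combined plasmid share comes out between 20% and 21%, in line with the statement that the four plasmids constitute about 20% of the genome.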

Our original study that identified pMP1046A (then designated pMP1046 [12]) in strain JCM1046
estimated its size as 230 kb, based on Pulsed Field Gel Electrophoresis (PFGE) [12]. However,
the assembled sequence data revealed pMP1046A to be closer to 220 kb in size. A combination of
restriction digestion, PFGE and Southern hybridisation was used to validate the size of
pMP1046A. ApaI was used to linearise the replicon prior to PFGE and Southern blot analysis.
Probes associated with the replication origin of pMP1046A hybridised to a band that migrated to
a constant position between the 194 kb and 242.5 kb linear λ DNA markers, which was in keeping
with the expected 219,748 bp size indicated by DNA sequencing.
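The marker comparison above amounts to checking which two rungs of the size ladder flank the expected fragment. A minimal sketch follows, assuming the ladder consists of λ concatemers in multiples of 48.5 kb (which is consistent with the 194 kb and 242.5 kb markers quoted); the function names are illustrative.

```python
LAMBDA_MONOMER_KB = 48.5  # assumed size of one rung step in the lambda ladder


def lambda_ladder(n_rungs):
    """Sizes (kb) of the first n_rungs lambda concatemers."""
    return [LAMBDA_MONOMER_KB * i for i in range(1, n_rungs + 1)]


def bracketing_markers(size_kb, ladder):
    """Return the (lower, upper) ladder rungs that flank size_kb, or None."""
    for lower, upper in zip(ladder, ladder[1:]):
        if lower < size_kb < upper:
            return lower, upper
    return None


expected_kb = 219_748 / 1000  # sequenced size of pMP1046A
print(bracketing_markers(expected_kb, lambda_ladder(8)))
```

Note that the earlier 230 kb PFGE estimate falls within the same marker interval, so this check confirms consistency with the sequence rather than discriminating between the two sizes.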

We identified two novel plasmids, pMP1046B and pCTN1046, from the genome sequence. A large
contig (~130 kb) was assembled that could not be experimentally determined to form part of
either the chromosome or the previously described plasmid content of strain JCM1046 [12]. This
contig harboured plasmid-associated replication and maintenance proteins. A PCR product spanning
the ends of this contig was generated and subsequently sequenced (data not shown), which showed
that the assembled contig was circular in the cell, and it was designated pMP1046B. Under the
PFGE conditions that are routinely used to visualise the plasmid content of L. salivarius
strains, pMP1046B had previously gone undetected [9,40], possibly because it was masked by the
linear replicon pLMP1046 [12].

We employed restriction digestion and S1 nuclease treatment in conjunction with PFGE and
Southern blot analysis to confirm the sizes and topologies of the plasmids present in JCM1046.
Figure 2 panels A and B illustrate the identification of a repB-type megaplasmid in JCM1046,
panels C and D display the linear plasmids of JCM1046, JCM1047 and AH43348, and panels E and F
illustrate the size and topology of pCTN1046. Chromosomal DNA bands of strains JCM1046, JCM1047
and AH43348 migrate to the equivalent of the 1 Mb marker (Figure 2 panels A, C and E). S1
nuclease preferentially nicks and linearises megaplasmids due to inherent torsional stresses
[41]. The linearised forms of the repA-type circular megaplasmids of the L. salivarius strains
are indicated by the open black arrows in Figure 2 panels A, C and E.

Table 1 (continued)

Feature               pMP1046B    pLMP1046    pCTN1046
Replicon size (bp)    129,218     101,883     33,315
GC content (%)        33.87       30.91       34.89
Topology              Circular    Linear      Circular
% of genome size      5.5         4.3         1.4
Coding genes          159         112         39
Coding density (%)    83.6        82.6        76
rRNA operons          0           0           0
tRNAs                 2           0           0
Pseudogenes           2           0           1

Figure 1 Genome atlas of the plasmids of L. salivarius JCM1046. A graphical representation of each plasmid in the L. salivarius JCM1046
genome was generated using DNAPLOTTER [39]. Genes on the forward and reverse strands (green); pseudogenes (grey blocks); GC% (black
above mean and grey below mean); GC skew (mustard above mean and purple below mean) are illustrated for each replicon. Genes encoded by
the plasmid backbone of pCTN1046 are also green; the genes present on the integrated conjugative transposon Tn6224 are represented as
follows: conjugative transfer (pink), accessory genes (turquoise), transcriptional regulation (dark blue) and recombination (yellow).

When an increased band intensity or band width is observed in a PFGE gel, it is often
indicative of the presence of linear DNA, high copy number extrachromosomal elements or
co-migrating bands of similarly sized DNA fragments [42]. Strain JCM1046 gDNA revealed
high-intensity bands in the S1-treated sample at a position just below the 145.5 kb lambda DNA
marker. This band represents the overlapping linear forms of pMP1046B and pLMP1046. In the
untreated sample of JCM1046, the circular form of pMP1046B is retained in the well; therefore
the repB gene probe binds only to the well but not to the migrating linear plasmid pLMP1046
(Figure 2 panel B). However, in the S1 nuclease-treated gDNA sample of JCM1046, the repB probe
hybridised strongly to the overlapping pLMP1046/pMP1046B bands (Figure 2 panel B), thereby
confirming that the discrete replicons pLMP1046 and pMP1046B appear as one overlapping 120 kb
band in their linear forms (Figure 2 panel B). The repB probe did not hybridise to the lanes
containing JCM1047 or AH43348 gDNA, indicating that these strains lack a second repB-type
circular megaplasmid (Figure 2 panel B). The presence of a second circular megaplasmid has also
been reported in strains NIAS840 and SMXD51, both of these strains being of animal origin
[15,16].

Both S1-treated and untreated gDNA samples of JCM1046, JCM1047 and AH43348 show the presence of
linear plasmids: pLMP1046 (140 kb), pLMP1047 (140 kb) and pLMP43348 (175 kb) respectively
(Figure 2, panels A and C). Each of the linear plasmids hybridised to a gene probe derived from
the pLMP1046 sequence (Figure 2D).
Figure 2 Confirmation of the genome architecture of L. salivarius JCM1046. (A, C and E) PFGE gels of enzyme-treated gDNA of strains
JCM1046, JCM1047 and AH43348. Corresponding Southern hybridisations using replicon-specific probes are shown directly below each gel (B, D
and F). The probes used for the Southern hybridisations targeted the following genes: the repB gene of pMP1046B (B), an endonuclease gene in
pLMP1046 (D) and a region spanning the int-xis genes of pCTN1046 (F). None of the probes employed showed cross hybridisation with non-target
replicons. S1 nuclease (+), SmaI, SphI and PstI were used individually or in combination to determine the plasmid profiles of each
strain. Untreated samples of gDNA are denoted by (-). Closed-black arrowheads indicate λ DNA concatamers used as size standards (A-F).
Chromosomal DNA bands of each strain are seen migrating to the equivalent of the 1 Mb marker (A, C and E). Open-black arrows indicate the
S1 nuclease-linearised repA megaplasmids in each strain examined (A, C and E). A repB-type megaplasmid was found to be present in strain
JCM1046 but absent from strains JCM1047 and AH43348 (A and B). Both S1-treated and untreated gDNA samples of JCM1046, JCM1047 and
AH43348 show the presence of linear plasmids of 140 kb, 140 kb and 175 kb respectively (C), each of which hybridises to a pLMP1046-derived
probe (D). S1 nuclease, SphI and PstI were independently used to linearise pCTN1046 (33 kb) (E). A probe based on the int and xis genes of
pCTN1046 binds to the linear form of pCTN1046 (F). pCTN1046 does not have a SmaI site and is retained in the well in its circular form in the
SmaI-digested sample.



A conjugative transposon in L. salivarius JCM1046 

We further identified a 33 kb plasmid in strain JCM1046 that was not previously observed in the
plasmid profile of this strain [12,40]; it was identified here by de novo scaffold assembly and
designated pCTN1046. It harbours a Tn916-like element and was experimentally determined to have
a circular topology. In silico analysis was first used to identify restriction enzymes whose use
would resolve the chromosomal DNA of JCM1046 from that of pCTN1046. SphI and PstI each cut the
chromosome multiple times, while linearising pCTN1046. Following treatment, pCTN1046 is visible
as a band which migrates to a position between the 23.1 kb and 48.5 kb markers, in keeping with
the assembled 33 kb size of pCTN1046 (Figure 2E). The chromosome of JCM1046 has multiple SmaI
restriction sites, while pCTN1046 has none. The multiple DNA bands in the SmaI-treated gDNA
sample (Figure 2E) are chromosomal fragments, while the uncut circular form of pCTN1046 was
retained in the well. A probe spanning the int and xis genes of pCTN1046 hybridised strongly to
the 33 kb bands in the S1 nuclease, SphI and PstI treated samples of JCM1046 (Figure 2F).
Similarly, the same probe hybridised to the circular form of pCTN1046 retained in the well of
the SmaI-treated sample, but did not hybridise to the migrating chromosomal bands (Figure 2F).
The same pattern of hybridisation was obtained when the experiment was repeated with a probe
based on the tetM gene harboured by pCTN1046 (data not shown). Although Tn916-like elements have
been shown to insert at a single site in some species, in almost all bacterial hosts they insert
at multiple sites [43]. Our data indicate that the conjugative transposon in strain JCM1046 is
integrated at a single site in pCTN1046 and is absent from the rest of the genome.
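The in silico enzyme selection described above can be illustrated by counting recognition sites on a circular replicon: no site leaves the circle intact (retained in the well), one site linearises it, and several sites fragment it. This is a minimal sketch, not the tool the authors used; the recognition sequences are the standard ones for SmaI, SphI and PstI, while the function names and the toy sequence are hypothetical.

```python
# Standard recognition sequences; all three are palindromic, so scanning
# one strand is sufficient.
RECOGNITION_SITES = {
    "SmaI": "CCCGGG",
    "SphI": "GCATGC",
    "PstI": "CTGCAG",
}


def count_sites(seq, site, circular=True):
    """Count occurrences of site in seq, including sites that span the
    arbitrary start/end junction when the molecule is circular."""
    seq = seq.upper()
    if circular:
        seq = seq + seq[: len(site) - 1]
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def classify(circular_seq, enzymes=RECOGNITION_SITES):
    """Predict the outcome of digesting a circular replicon with each enzyme."""
    outcome = {}
    for name, site in enzymes.items():
        n = count_sites(circular_seq, site)
        if n == 0:
            outcome[name] = "retained uncut"
        elif n == 1:
            outcome[name] = "linearised"
        else:
            outcome[name] = f"{n} fragments"
    return outcome


# Hypothetical toy sequence with one SphI site, one PstI site, no SmaI site.
toy_plasmid = "AAGCATGCTTTTCTGCAGAA"
print(classify(toy_plasmid))
```

Applied to the real replicons, the enzymes of interest are those that report "linearised" or "retained uncut" for the plasmid while cutting the chromosome many times.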

General genome features of L. salivarius JCM1046 

The unusual genome complexity of JCM1046 raised 
questions about gene distribution by replicon. Bioinfor- 
matic analysis identified 1,705 coding sequences in the 
chromosome, a coding density of 83.3% (Table 1). Biological
functions could not be assigned to 360 of these
protein coding sequences. The chromosome of L. sali- 
varius JCM1046 contains 60 pseudogenes (Additional 
file 1). Seven rRNA operons were identified on the chro- 
mosome, as well as 76 tRNA genes for all 20 amino acids. 
The chromosome has an average GC content of 33.1%, 
with three regions displaying atypical GC content relative 
to the rest of the genome (see below). 

The largest of the plasmids pMP1046A has a coding 
density of 80.7%. 214 coding sequences were identified, 
79 of which were for hypothetical proteins. pMP1046A 
contains 15 pseudogenes (Additional file 1). The gene 
content of pMP1046A will be discussed in detail below. 

We identified 159 coding regions in pMP1046B, though biological function could only be assigned
to 29.7%, the vast majority (110/158) of genes remaining cryptic. The GC% content of pMP1046B
(33.87%) correlates well with the 33.1% GC content of the JCM1046 chromosome (Table 1),
suggesting long-term adaptation to the host cell, or acquisition from a bacterium with a similar
% GC content. In addition to harbouring plasmid-associated replication machinery, pMP1046B
harbours additional housekeeping and information-related genes, thus fulfilling the criteria for
extrachromosomal elements known as chromids [44]. pMP1046B encodes two tRNA genes, tRNA (Gln)
(LSJ_3064) and tRNA (Ser) (LSJ_3066), but these genes are not uniquely present on pMP1046B, i.e.
they are paralogs of chromosomally encoded genes. Gene duplication can offer a level of genomic
redundancy to a strain that is adapting to a new environment [45], and the tRNA genes encoded by
pMP1046B may enable JCM1046 to respond more rapidly to changing environmental conditions.

pLMP1046 harbours 112 coding sequences, none of which were pseudogenes. However, 85 of the
predicted coding sequence products were annotated as hypothetical proteins, some of which may
represent remnants of functional genes. The average GC content of pLMP1046 (30.9%) is
significantly lower than that of the JCM1046 chromosome (33.1%), implying that these replicons
experienced distinct evolutionary histories and that pLMP1046 may be a recent acquisition.

PFGE analysis predicted the size of pLMP1046 to be approximately 130 kb (this study), but
sequencing revealed a replicon that was 102 kb. It is reasonable to assume that this discrepancy
and the lack of identifiable terminal inverted repeats (TIR) (discussed below) are assembly
artefacts caused by omission of the presumptive repeat sequences in the terminal regions of
pLMP1046. The problems faced in the sequencing of the telomeres of linear elements are well
recognised [46].

In keeping with the guidelines outlined by Roberts et al. [47], the novel conjugative transposon
contained within pCTN1046 was designated Tn6224. In silico analysis predicted a coding density
of 76% for pCTN1046. Thirty-nine coding sequences were identified (Table 1), the majority of
which (21/39) are linked to the integrated transposon. The sole pseudogene harboured by this
replicon lies outside the Tn6224 region and shows similarity to nitroreductase family proteins.
The plasmid backbone of pCTN1046 has an average GC content of 30.8%, whereas Tn6224 has an
average GC content of 38.6%. Unsurprisingly, this suggests that Tn6224 was most likely acquired
via horizontal gene transfer (HGT). Insertion of Tn916-like elements is not random, with the
insertion sites differing from species to species [38], but generally displaying a distinct
preference for target sites which are A-T rich and that have a limited homology with the ends of
the element [43]. As only one copy of Tn6224 was found in the genome of JCM1046, a putative
consensus of the target sequence in L. salivarius could not be determined. Accounting for the
potential presence of coupling sequences, the 35 bp flanking either end of Tn6224 were examined
to determine if the target sites in L. salivarius are in keeping with those generally described
for these elements [38]. The AT contents of the sequences upstream and downstream of Tn6224 were
found to be 97.1% and 85.7% respectively, indicating that the target site for Tn6224 is likely
to be similar to those of other species [38].
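The flank analysis above reduces to extracting 35 bp on either side of the element and computing the fraction of A and T bases. A minimal sketch follows; the function names and coordinate convention (0-based, end-exclusive) are assumptions made for illustration. For a 35 bp window, the reported 97.1% and 85.7% are what 34 and 30 A/T bases out of 35 would give.

```python
def at_content(seq):
    """Percentage of A and T bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def flanks(replicon, start, end, width=35):
    """Return the (upstream, downstream) flanks of an element occupying
    replicon[start:end]; coordinates are 0-based and end-exclusive."""
    upstream = replicon[max(0, start - width):start]
    downstream = replicon[end:end + width]
    return upstream, downstream


# Arithmetic check on the reported values for a 35 bp window.
print(round(100.0 * 34 / 35, 1), round(100.0 * 30 / 35, 1))
```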

Phage, transposases and CRISPR regions 

PHAST [48] identified two regions of bacteriophage-related DNA in the genome of JCM1046, both
located on the chromosome. In addition to a 22.6 kb remnant prophage that spans residues
1378015-1400296 bp, an intact 28,541 bp prophage was also identified on the chromosome which
spans residues 1439831-1444300 bp. At 43.7%, the remnant prophage is one of the three regions of
atypical GC content.

102 transposases (including 22 pseudogenes), repre- 
senting eight IS families were found distributed across 
four of the five replicons of strain JCM1046. The distri- 
bution of transposases is detailed in Additional file 2. 

Clustered regularly interspaced short palindromic repeats (CRISPRs) and CRISPR-associated genes
(cas genes) provide the host with acquired and heritable resistance against genetic
transformation, phage and plasmid proliferation [49]. One CRISPR-associated (cas) system was
identified on the chromosome of JCM1046 at position 810173-812140 bp, consisting of a 1059 bp
repeat locus composed of a 36 bp direct repeat and 26 spacers. This CRISPR region is immediately
upstream of the gene encoding Cas2 and immediately downstream of eight additional
CRISPR-associated protein coding genes.

Replication of extrachromosomal elements 

The replication region of pMP1046A extends from LSJ_2000 to LSJ_2006 (6449 bp). The gene content
and organisation of the replication region of pMP1046A are highly similar (98% nt identity (ID))
to those of pMP118 [9] and of other sequenced L. salivarius strains (Figure 3). pMP1046A is
likely to replicate by theta-form replication [50].

The predicted replication region of pMP1046B spans 
residues 128175-1974 bp of the plasmid. This region in- 
cludes a repA gene (LSJ_3160) at the position of a switch 
in GC skew that is characteristic of replication origins 
[52]. LSJ_3160 shares 36-56% aa ID with L. salivarius 
RepA protein sequences. The RepA protein of pMP1046B 
also displays 40% aa ID to the RepA protein of the pig iso- 
late Lactobacillus reuteri ATCC 53608 [53]. The second 
gene in the pMP1046B ori region, LSJ_3000 encodes a 
predicted partitioning/copy control protein, RepB. 
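A switch in GC skew of the kind referred to above is commonly located by accumulating the windowed skew (G - C)/(G + C) along the sequence and looking for its minimum. The sketch below illustrates that general approach on a synthetic sequence; it is not the analysis pipeline used in this study, and the window size, function names and test sequence are assumptions. On a circular replicon the coordinate origin is arbitrary, so real data may need to be rotated before interpretation.

```python
def gc_skew(window):
    """(G - C) / (G + C) for one window; 0.0 if the window has no G or C."""
    g, c = window.count("G"), window.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def cumulative_skew(seq, window=1000):
    """List of (window_start, running_total_of_skew) for consecutive windows."""
    seq = seq.upper()
    total, points = 0.0, []
    for start in range(0, len(seq) - window + 1, window):
        total += gc_skew(seq[start:start + window])
        points.append((start, total))
    return points


def skew_switch(seq, window=1000):
    """Coordinate at the end of the window where cumulative skew is minimal,
    i.e. where the skew turns from negative to positive."""
    points = cumulative_skew(seq, window)
    if not points:
        raise ValueError("sequence shorter than one window")
    start, _ = min(points, key=lambda p: p[1])
    return start + window


# Synthetic example: a C-rich half followed by a G-rich half.
synthetic = "ACCT" * 1250 + "AGGT" * 1250
print(skew_switch(synthetic))
```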

Analysis of pLMP1046 indicates that it shares limited sequence homology or gene synteny with
linear replicons of other species. However, given the lack of sequenced counterparts in other
lactobacilli, the absence of homologous genes in databases is unsurprising. Replication is
commonly initiated from one or more internal ori sites in linear plasmids and proceeds
bidirectionally towards the telomeres [54-56]. Our previous study indicated that the linear
plasmids of L. salivarius did not harbour the repA and repE genes encoded by the circular
repA-type megaplasmids of L. salivarius [12], and thus it was presumed that pLMP1046 utilised an
alternate mode of replication to the circular plasmids of L. salivarius [12]. Sequence analysis
identified two plasmid-associated replication genes encoded by pLMP1046, LSJ_4017 (nt
25084-26103) and LSJ_4096 (nt 89781-91007). LSJ_4017 exhibits 39-41% aa ID with proteins
annotated as either RepE or RepA in the circular megaplasmids of L. salivarius. This level of
sequence homology was not high enough to cause cross hybridisation between the replication genes
of pMP118 and the repA/E gene identified in pLMP1046, thus accounting for the observations of
our previous study [12]. LSJ_4096 encodes a putative RepB-like replication initiator protein.
The replication origins of Streptomyces linear plasmids are comprised of helicase-like rep genes
and iterons [22], while the replication ori of N15 is located within the repA gene, whose
product acts as a multifunctional protein combining primase, helicase and origin-binding
activities [57]. RepA boxes were not identified in the proximity of either the repA or repB
genes of pLMP1046; however, the genomic region immediately upstream of repA coincides with a
switch in GC skew. This suggests that the repA gene lies within the putative ori region of
pLMP1046.

Figure 3 A comparison of eight repA-type megaplasmids of L. salivarius. A BLAST atlas diagram of eight repA-type megaplasmids of L. salivarius
was generated using BLAST Ring Image Generator (BRIG) [51], using pMP1046A as the reference replicon (the outer dark green ring). Working
inwards from pMP1046A, the next seven rings represent query repA-type plasmids of L. salivarius strains: cp400, pMP20555, pMP118, pHN3,
pMPGJ-24, pNA2, pLS51A. When the completed or circularised version of the repA-type megaplasmid was not available (L. salivarius cp400 [18]
and L. salivarius DSM20555), all available sequence data for each strain was mapped to pMP1046A. Regions of diversity between the repA-type
megaplasmids are indicated by the labels R1-R9. The GC% of pMP1046A was projected onto the mapped plasmid sequences (black ring) and sits
outside the molecular clock surrounding the figure legend at the centre of the figure.

The mechanism that pLMP1046 uses to prevent the 
progressive shortening of its telomeres after each cycle
of replication is unknown. It is possible it employs a cir- 
cular mode, as in some Streptomyces linear plasmids 
[58], but it is more plausible that the sequence of 
pLMP1046 is missing sections of its terminal regions 
due to a sequencing or assembly artefact. Further ana- 
lysis of the terminal regions of pLMP1046 will be re- 
quired to fully elucidate the mechanism involved in the 
replication of L. salivarius linear plasmids. 

There are two replication-associated genes harboured by the plasmid backbone of pCTN1046, which
are separated by approximately 6 kb. LSJ_5030c shares 52% aa ID with a replication-associated
protein in Lactobacillus amylovorus GRL 1112. LSJ_5035c encodes the plasmid-associated
replication protein, RepB, the gene for which coincides with the position of a switch in GC
skew, and is therefore presumed to mark the replication origin of pCTN1046. LSJ_5035c shares 36%
aa ID with the RepB protein of L. lactis subsp. cremoris TIFN1 and 100% aa ID with a replication
initiation protein in the 30.6 kb plasmid pLS51C in L. salivarius SMXD51.

Plasmid maintenance 

Several of the JCM1046 plasmids encode genes implicated in plasmid incompatibility. Three of the
plasmids (pMP1046B, pLMP1046 and pCTN1046) encode a repB-like gene, two (pMP1046A and pMP1046B)
encode repE-like genes and two (pMP1046A and pLMP1046) encode repA-like genes. However, the
presumptive replication regions of the co-resident plasmids display low levels of sequence ID,
with the highest nt ID shared between the repB genes of pLMP1046 and pCTN1046 at 58.7%. The
mosaic nature of the replication regions as well as the lack of nucleotide homology between the
respective replication-associated genes of the co-resident plasmids is a plausible explanation
for the compatibility of the plasmids that co-exist in strain JCM1046. Several complete
Toxin-Antitoxin (TA) systems were identified on plasmids pMP1046A and pLMP1046 and likely play a
role in the stability and maintenance of the co-resident plasmids in JCM1046.

Comparative L. salivarius genomics and relationship to phenotype

Chromosome 

In contrast to the human probiotic strains L. salivarius UCC118 and L. salivarius CECT 5713,
which share 98.5% nt pairwise ID between their chromosomes and 98.6% nt pairwise ID between
their repA-type megaplasmids, the genome structure and sequence of JCM1046 diverge significantly
from those of the other published L. salivarius strains.

The chromosome of JCM1046 shares 68.4% nt pair- 
wise ID with strain UCC118 and includes 55 regions 
(min 800 bp) [59], representing 16.5% of the chromo- 
some, that are absent from strain UCC118 (Additional 
file 3). Indeed, a comparison of the chromosome of strain 
JCM1046 to that of the other published L. salivarius gen- 
ome sequences revealed 48 chromosomally encoded genes 
in JCM1046 that were absent in the other published L. 
salivarius genomes (Additional file 4). These genes pri- 
marily belong to categories of genes that have been shown 
to be hypervariable among L. salivarius strains [7] and 
other Lactobacillus species [60] and include transposases, 
phage-associated genes, and genes involved in carbohy- 
drate metabolism and host interaction (Additional file 4). 
The GC% map of the JCM1046 chromosome identifies 
three regions with significantly deviating GC content, one 
of which is the remnant prophage that is resident on the 
chromosome. The smallest of these regions stretches from 
residues 782,449 to 793,883 bp. This 11.4 kb region has a 
GC% content of 43.6% and encodes a protein containing a 
mucin-binding MucBP domain (LSJ_0784), several trans- 
posases, hypothetical proteins and a choloylglycine hydro- 
lase (BSH2, LSJ_0788). Although present in the porcine 
strains JCM1046 and cp400, this region is absent from 
other sequenced genomes of L. salivarius and may repre- 
sent a niche specific adaptation. 
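Regions of deviating GC content such as those described above are typically found by comparing the GC% of fixed-size windows against the replicon-wide value. The sketch below shows one simple way to do this on a synthetic sequence; the window size, the 8-percentage-point threshold and the function names are assumptions chosen for illustration, not parameters reported in this study.

```python
def gc_percent(seq):
    """GC content of seq in percent (case-insensitive)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(seq, window=5000, step=5000, min_delta=8.0):
    """Return (start, end, gc) for each window whose GC% differs from the
    whole-sequence GC% by at least min_delta percentage points."""
    overall = gc_percent(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_percent(seq[start:start + window])
        if abs(gc - overall) >= min_delta:
            hits.append((start, start + window, round(gc, 1)))
    return hits


# Synthetic example: ~33% GC background with a 5 kb island at 50% GC.
background = "AAG" * 10000
island = "AG" * 2500
synthetic_chromosome = background + island + background
print(atypical_windows(synthetic_chromosome))
```

With real data, overlapping windows (a smaller `step`) would localise region boundaries more precisely, at the cost of reporting adjacent overlapping hits that then need merging.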

BSH2 is one of two choloylglycine hydrolase genes 
encoded by the genome of JCM1046 [11]; the second 
(BSH1, LSJ_2111) is present on pMP1046A and is wide- 
spread among L. salivarius strains [11]. In contrast, 
BSH2 has only been identified in three isolates to date,
JCM1046, LMG14476 and cp400, all of which are of animal
origin. BSH2 confers on JCM1046 the ability to
resist much higher concentrations of the major human
conjugated bile acids when compared to strains that 
harbour BSH1 alone [11]. In addition, BSH2 has recently 
been shown to reduce weight gain and serum LDL chol- 
esterol and liver triglycerides in mice fed normal or 
high-fat diets [61]. 

We have previously shown that exopolysaccharide 
(EPS) production levels and the presence of associated 
genes vary widely in L. salivarius [7]. JCM1046 harbours
a single EPS gene cluster that spans 33 kb, containing 33
genes, including two pseudogenes (Additional file 5).
The EPS locus exhibits an atypical GC content relative to
the rest of the chromosome (29.7% versus 33.1%).

pMP1046A 

Nine substantial regions of sequence diversity, ranging in 
size from 3.8-22.6 kb were identified between pMP1046A 
and the sequences of the other published repA-type mega- 
plasmids (Figure 3; Table 2). Hypothetical proteins and 
transposases are abundant within these regions (Table 2). 
Indeed, region two and region four primarily harbour 
hypothetical proteins, while region six harbours only IS el- 
ements (Table 2, R2, R4 and R6). Regions three and eight 
mostly encode solute transporters (Table 2 R3 and R8). 

The largest region of diversity among the strains examined is 22.6 kb (Figure 3, R1) and
harbours several genes predicted to work synergistically with chromosomally encoded pathways to
broaden the metabolic capabilities of strain JCM1046. Although present in strain cp400, this
region is highly divergent in all other examined plasmids (Figure 3) and primarily encodes
proteins involved in aa metabolism. JCM1046 is prototrophic for L-proline due to the presence of
a chromosomally encoded pathway. Three paralogous genes (LSJ_2016, LSJ_2020 and LSJ_2021) in
this region are responsible for the interconversion of L-proline to D-proline. Also present in
this region are two genes (LSJ_2031, selD and LSJ_2028, selA) which work in conjunction with the
chromosomally encoded gene (LSJ_0220, serS) to synthesise L-selenocysteine. These increased
biosynthetic capabilities are likely to enhance the ability of JCM1046 to thrive in the
competitive porcine GIT.

The genes present in regions five and nine (Table 2)
are primarily involved in the metabolism and transport
of carbohydrates, and vary from strain to strain (Figure 3,
R5 and R9). Like pMP118, pMP1046A harbours both single-copy
and paralogous genes that complete a number of the
carbohydrate fermentative pathways that are partially
encoded by the chromosome of JCM1046. These include the
pentose phosphate and gluconeogenesis pathways as well as
the fermentation pathways for sorbitol and rhamnose.

Bacteriocin production is a putative probiotic trait of
L. salivarius strains (see review [62]). The genetic
organisation of the 7.9 kb bacteriocin locus in pMP1046A
is analogous to that of the Abp118 locus in the human
isolate UCC118 (Figure 3, R7). The structural genes
(LSJ_2170 and LSJ_2169) of the bacteriocin locus of
pMP1046A are identical to the genes (Sln1 and Sln2)
which are responsible for the production of the
two-component antilisterial bacteriocin Salivaricin P. This
bacteriocin differs in sequence from Abp118 by two amino
acids [63] and is produced by several other porcine
isolates of L. salivarius [63,64]. However, a frame-shift in
the abpT gene (LSJ_2163) of JCM1046 is likely responsible
for the bacteriocin-negative phenotype observed in
this strain [12].

pCTN1046 

The conjugative element Tn6224 harboured by plasmid
pCTN1046 shares 96.2% nt sequence ID with the conjugative
element Tn916 and lacks only two genes which encode
hypothetical proteins in the conjugative region of Tn916.
Among the other sequenced L. salivarius genomes, pCTN1046
shares 64.6% nt ID with the 30.4 kb plasmid pLS51C
harboured by the probiotic avian isolate SMXD51 [16]. This
plasmid shares sequence homology with both the plasmid
backbone and conjugative element of pCTN1046 (Figure 4).
Tn6224 appears to be functionally intact, containing the
conjugative, recombination, transcriptional regulation and
accessory genes (Additional file 6) associated with Tn916.
In contrast, the integrated conjugative element that is
resident in pLS51C appears to be a remnant of a conjugative
element as it lacks the recombination genes xis (LSJ_5019)
and int (LSJ_5020). pLS51C harbours a limited number of the
conjugative genes present in Tn6224 and Tn916 but lacks
the ardA gene present in pCTN1046, which has recently been
shown to aid the transfer of mobile genetic elements (MGEs)
between unrelated bacterial species [65]. A putative
TnGBS1-like element (TnLsal1.1) was identified in
L. salivarius strain DSM20555. However, our analysis
suggests that the contig predicted to harbour TnLsal1.1
[66] forms part of the putative pMP20555 megaplasmid in
the type-strain L. salivarius DSM20555. The weak homology
between the proteins identified in TnLsal1.1 and those
identified in other TnGBS1-like elements [66] may be due
to their similar functional roles in their respective
replicons.

L. salivarius strains are increasingly being examined 
for their probiotic properties in both humans and ani- 
mals [5]. Dissemination of antibiotic resistance genes via 
the food chain to either the resident microbiota of the 
human gut or pathogenic bacteria is likely to have
far-reaching effects on both human and animal health and
present a major financial cost [67]. Thus, the identifica- 
tion of conjugative transposons carrying antibiotic resist- 
ance genes in the genomes of two animal isolates of L. 
salivarius may have repercussions for strain selection in 
future probiotic studies. 

pMP1046B and pLMP1046

Plasmids pMP1046B and pLMP1046 share neither sequence
homology nor gene synteny with the other L. salivarius
plasmids sequenced to date. Both of these replicons require
further functional characterisation to determine whether
or not they have an impact on the phenotype and ecological
properties of JCM1046.

Table 2 Regions of sequence diversity in pMP1046A

Gene           Start        End          Gene product

R1: 14543..37152 (22609)
LSJ_2012       14616        15683        Hypothetical membrane protein
LSJ_2013       15827        16882        Hypothetical membrane protein
LSJ_2014       17072        17242        Conserved hypothetical protein
LSJ_2015       17257        18330        Putative thiosulfate sulfurtransferase
LSJ_2016       18782        20686        D-proline reductase, prdA
LSJ_2017       20688        21005        Conserved hypothetical protein
LSJ_2018 (P)   20995        21720        Proline reductase, probable pseudogene
LSJ_2019       21740        22480        Conserved hypothetical protein
LSJ_2020       22501        22974        D-proline reductase
LSJ_2021       22999        24018        Proline racemase
LSJ_2022       24031        24909        Hypothetical membrane protein
LSJ_2023       25005        26603        Amino acid permease
LSJ_2024       26684        27322        Conserved hypothetical protein
LSJ_2025       27434        27691        Conserved hypothetical protein
LSJ_2026       27691        29583        Selenocysteine-specific elongation factor
LSJ_2027       29573        30712        Cysteine desulfurase
LSJ_2028       30713        32113        L-seryl-tRNA selenium transferase, selA
LSJ_2029       32119        32448        Conserved hypothetical protein
LSJ_2030       32560        33870        NADH dehydrogenase
LSJ_2031       33963        34937        Selenophosphate synthase, selD
LSJ_2032       35037        36209        Hypothetical membrane protein
LSJ_2033       36228        36473        Conserved hypothetical protein
LSJ_2034       36698        37129        Conserved hypothetical protein

R2: 52540..64667 (12127)
LSJ_2049       52523        54520        Conserved hypothetical protein
LSJ_2050       54507        55370        Conserved hypothetical protein
LSJ_2051       55360        56145        Conserved hypothetical protein
LSJ_2052       56204        58906        DNA methylase
LSJ_2053       58913        60871        DEAD/DEAH box helicase family protein
LSJ_2054       60864        62066        Conserved hypothetical protein
LSJ_2055       62137        64662        Conserved hypothetical protein

R3: 88322..98017 (9695)
LSJ_2078 (P)   88463        88739        Transposase, probable pseudogene
LSJ_2079       88814        89641        Transposase ISLasa15, IS3 family
LSJ_2080       90080        91279        MFS transport protein
LSJ_2081       91593        92813        MFS transport protein
LSJ_2082       92865        93761        Transcriptional regulators, LysR family
LSJ_2083       93920        94756        Conserved hypothetical protein
LSJ_2084       94785        95612        2-deoxy-D-gluconate 3-dehydrogenase
LSJ_2085       95631        97346        Fumarate reductase flavoprotein subunit
LSJ_2086       97367        98242        Shikimate 5-dehydrogenase

R4: 100291..121050 (20759)
LSJ_2089       100815       101621       Conserved hypothetical protein
LSJ_2090       101614       101832       Hypothetical protein
LSJ_2091       102071       102190       Hypothetical protein
LSJ_2092       102310       103761       Plasmid replication protein-primase
LSJ_2093       103865       [illegible]  Hypothetical membrane protein
LSJ_2094       [illegible]  [illegible]  Hypothetical membrane protein
LSJ_2095       [illegible]  [illegible]  Hypothetical protein
LSJ_2096       105746       [illegible]  Hypothetical secreted protein
LSJ_2097       [illegible]  [illegible]  Hypothetical protein
LSJ_2098       [illegible]  [illegible]  Conserved hypothetical protein
LSJ_2099       [illegible]  [illegible]  Hypothetical secreted protein, possible signal peptide
LSJ_2100       [illegible]  [illegible]  Hypothetical protein
LSJ_2101       [illegible]  [illegible]  Putative DNA-entry nuclease
LSJ_2102       111064       111543       Conserved hypothetical protein
LSJ_2103       [illegible]  [illegible]  Hypothetical secreted protein
LSJ_2104       111993       112103       Hypothetical protein
LSJ_2105       112160       112756       Conserved hypothetical protein
LSJ_2106       [illegible]  [illegible]  Conserved hypothetical protein
LSJ_2107       [illegible]  [illegible]  Hypothetical protein
LSJ_2108       [illegible]  [illegible]  Conserved hypothetical protein
LSJ_2109       [illegible]  [illegible]  Hypothetical protein

R5: 147401..153337 (5936)
LSJ_2136c      [illegible]  [illegible]  Fructose-6-phosphate aldolase
LSJ_2137c      [illegible]  [illegible]  PTS system, glucitol/sorbitol-specific IIA component
LSJ_2138c      148565       [illegible]  PTS system, glucitol/sorbitol-specific IIBC component
LSJ_2139c      [illegible]  [illegible]  PTS system, glucitol/sorbitol-specific IIC2 component
LSJ_2140c      [illegible]  [illegible]  Sorbitol operon activator
LSJ_2141c      [illegible]  [illegible]  Sorbitol operon transcription regulator
LSJ_2142c      152534       153337       Sorbitol-6-phosphate 2-dehydrogenase

R6: 160003..164289 (4286)
LSJ_2150       [illegible]  160575       Transposase [illegible], IS1223 family
LSJ_2151       [illegible]  [illegible]  IS1223 family transposase
LSJ_2152       161537       162544       Transposase fragment
LSJ_2153       [illegible]  [illegible]  ISL3 family transposase

R7: 167503..182637 (15134)
LSJ_2155       166716       167537       Integrase
LSJ_2156       167573       167839       Hypothetical protein
LSJ_2157       168343       169011       Hypothetical protein
LSJ_2158       169087       169419       Hypothetical protein
LSJ_2159       169424       170053       Conserved hypothetical protein
LSJ_2160       170398       170802       Toxin antitoxin system, toxin component
LSJ_2161       170802       171023       Toxin antitoxin system, antitoxin component
LSJ_2162       171466       172614       AbpD bacteriocin export accessory protein
LSJ_2163 (P)   172630       174788       AbpT bacteriocin export accessory protein, probable pseudogene due to frameshift
LSJ_2164       175441       175680       Hypothetical membrane spanning protein
LSJ_2165       175717       176511       AbpR response regulator
LSJ_2166       176525       177817       AbpK sensory transduction histidine kinase
LSJ_2167       177819       177938       AbpIP induction peptide
LSJ_2168 (P)   178086       178232       AbpIM bacteriocin immunity protein
LSJ_2169       178371       178577       Abp118 bacteriocin beta peptide
LSJ_2170       178595       178789       Abp118 bacteriocin alpha peptide
LSJ_2171       178795       179052       Bacteriocin-like prepeptide
LSJ_2172       [illegible]  [illegible]  Nonfunctional salivaricin B precursor
LSJ_2173       [illegible]  [illegible]  Hypothetical membrane spanning protein
LSJ_2174       [illegible]  [illegible]  Hypothetical protein
LSJ_2175       [illegible]  [illegible]  Hypothetical membrane associated protein
LSJ_2176 (P)   [illegible]  [illegible]  HAD-superfamily hydrolase, probable pseudogene due to frameshift
LSJ_2177       [illegible]  [illegible]  Hypothetical protein

R8: 189782..193560 (3778)
LSJ_2187       [illegible]  [illegible]  Sodium solute symporter
LSJ_2188       [illegible]  [illegible]  Na+/H+ antiporter
LSJ_2189       [illegible]  [illegible]  Xylose isomerase domain protein

R9: 204232..215364 (11132)
LSJ_2201       [illegible]  [illegible]  Transketolase
LSJ_2202       206304       206957       Transaldolase
LSJ_2203       207466       208521       L-iditol 2-dehydrogenase
LSJ_2204       208562       209611       Alcohol dehydrogenase
LSJ_2205       209625       210896       Galactitol PTS, EIIC
LSJ_2206       210923       211219       Galactitol PTS, EIIB
LSJ_2207       211254       211706       Galactitol PTS, EIIA
LSJ_2208       211888       212688       DeoR family transcriptional regulator
LSJ_2209       212812       215178       Xylulose-5-phosphate/fructose-6-phosphate phosphoketolase

Genes associated with the regions of diversity (R1-R9) in pMP1046A, as illustrated in Figure 3. Each region is listed with its base
coordinates, followed in parentheses by the size of the region in bp. Genes present on the reverse strand are denoted by the suffix c
following the locus tag (LSJ_XXX). Pseudogenes are denoted by (P).

Conclusion 

The porcine strain JCM1046 harbours the most structurally
complex multipartite genome identified in L. salivarius
to date. Through complete sequencing and assembly of the
genome of JCM1046 we identified two additional replicons
that were not previously known to form part of the plasmid
complement of this strain, and that would probably not
have been identified by the high-coverage draft genome
sequencing commonly applied. We determined that one of
these replicons, pMP1046B, is a candidate chromid, though
much of its gene function remains cryptic. The plasmids of
L. salivarius probably confer on their host many of the
genes associated with niche adaptation and which are known
to modulate the phenotype of a strain significantly.
JCM1046 was found to harbour both plasmid-encoded
(pMP1046A) and chromosomally encoded genes associated with
adaptation to the GIT environment. The putative replication
ori of pLMP1046 was identified and the sequence of this
linear plasmid will provide a genetic platform for the
study of linear DNA replication in Lactobacillus sp. An
integrated conjugative transposon (Tn6224), carrying
tetracycline resistance, was identified in plasmid
pCTN1046, the first described in a sequenced L. salivarius
genome. It will be interesting to see how prevalent
Tn6224-like elements are within the L. salivarius
population as more genome sequences become available.

Figure 4 Sequence alignment of Tn916, pCTN1046 and pLS51C. A linear comparison of the BLASTN matches between the extrachromosomal
replicons pCTN1046 and pLS51C (harboured by L. salivarius strain SMXD51 [16]) and the conjugative transposon Tn916. Vertical grey-coloured blocks
between sequences indicate regions of shared nt ID. The gradient of the grey colour corresponds to the percentage of shared nt ID (dark grey
(100%)-light grey (75%)). The genes in each element are coloured according to their function in the conjugative transposon Tn916: pink (conjugative
transfer), turquoise (accessory genes and transcriptional regulation), dark blue (transcriptional regulation) and yellow (recombination). Genes encoded
by the plasmid backbone of pCTN1046 are green, and those associated with the backbone of pLS51C are dark purple.

Methods 

Bacterial strains and culture conditions 

L. salivarius strains were routinely cultured at 37°C
under micro-aerophilic conditions (5% CO2) in de
Man-Rogosa-Sharpe (MRS) medium (Oxoid Ltd, Basingstoke,
Hampshire, UK).

PFGE plug preparations 

Agarose gel plugs of high molecular weight DNA for 
PFGE were prepared according to a published protocol 
[12]. 

S1-nuclease treatment

Single slices (2 mm x 2 mm) were treated with Aspergillus
oryzae S1 nuclease (Roche, Mannheim, Germany) according
to a published protocol [12].

Restriction of PFGE plugs 

Single slices (2 mm x 2 mm) were washed three times for
15 min in 1 ml 10 mM Tris.Cl, 0.1 mM EDTA (pH 8.0) at
room temperature. Each slice was pre-incubated for 30 min
at 4°C with 250 µl of the restriction buffer recommended
for the enzyme; this buffer was then replaced with 250 µl
of fresh buffer containing 20 units of restriction enzyme.
Restriction digests were carried out overnight at the
temperatures recommended by the supplier.

Pulsed field gel electrophoresis 

Treated (S1-nuclease/restriction enzyme) and untreated
plugs of genomic DNA were examined under conditions
employed in a previously published protocol [12]. Gels
were stained in distilled water containing 0.5 µg/ml
ethidium bromide for 60 min in light-limited conditions
and destained in water for 30 min.

Probe preparation and Southern hybridization 

Probe preparations and Southern blot hybridizations 
were carried out according to a published protocol [12]. 
The primers used to generate PCR amplicons that were 
used as probes are listed in Additional file 7. 



Genome sequencing 

L. salivarius genomic DNA (gDNA) isolation was performed
as described previously [1]. The genome of JCM1046 was
sequenced using a combination of shotgun sequencing by the
Sanger method (4-fold coverage), pyrosequencing (24-fold
coverage) and Illumina sequencing (204-fold coverage). A
large-insert (~40 kb) fosmid library was constructed in
the CopyControl™ pCCFOS™ vector system (Epicentre
Technologies, USA). Insert ends (~800 bp/read) were
sequenced, generating mate pairs and 7.5 Mb of sequencing
data. Pyrosequencing generated approximately 217,000
unpaired reads (~250 nt) from a half plate on a 454 FLX
instrument (Agencourt Biosciences, Beverly, MA). In
addition to the shotgun and 454 data for the JCM1046
genome, an additional half lane of Illumina sequencing
(23 Mb total sequence data) was obtained, which consisted
of a 3 kb mate-pair library and a 400 bp paired-end library
(Fasteris, Geneva, Switzerland). Each Illumina library
provided an average of 204-fold coverage. Illumina reads
were assembled (default settings) into contigs using
Velvet v 0.7 [68], which were then used to generate 300 bp
pseudocontigs. A de novo genome assembly of the shotgun,
454 and Illumina (pseudocontigs) sequence data was
performed using the Roche/454 Life Sciences Newbler (GS)
assembler v 2.3 [69], producing an initial assembly of 102
contigs (>500 bp) distributed over 32 scaffolds for the
genome of JCM1046. The resulting 454 assembly was then
used as a reference for the mapping of raw Illumina data.
This mapping assembly was performed using Mira [70] and
was undertaken to extend contigs, close gaps and correct
errors in the draft genome. Gap closure was achieved
using a PCR-based strategy. Primers were designed at the
ends of contigs and DreamTaq DNA polymerase (Fermentas,
Ontario, Canada) was used to amplify products
corresponding to contig-contig gaps. Scaffolds were ordered
and oriented by PCR: primers were designed at the ends
of the scaffolds and the inter-scaffold region was
amplified using Extensor long PCR enzyme mix (Abgene, Epsom,
UK). PCR products for both the sequencing gaps and the
inter-scaffold gaps were sequenced by Eurofins MWG
Operon (Ebersberg, Germany) and the sequences were
integrated into the assembly using PHRAP [71]. Correct
placement of the gap sequences was confirmed by
observation using Tablet, a next generation sequencing
graphical viewer [72].
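
The fold-coverage figures quoted above relate read count, read length and genome size. The following is a rough sketch of that arithmetic in Python, assuming a total genome size of about 2.31 Mb (the sum of the approximate replicon sizes given in the abstract). The function name is illustrative and the result is only an estimate from rounded inputs.

```python
def fold_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Estimate sequencing depth as total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * mean_read_len / genome_size


# Approximate replicon sizes in bp, taken from the abstract (rounded values).
REPLICONS = {
    "chromosome": 1_830_000,
    "pMP1046A": 219_000,
    "pMP1046B": 129_000,
    "pLMP1046": 101_000,
    "pCTN1046": 33_000,
}
genome_size = sum(REPLICONS.values())

# Pyrosequencing run: about 217,000 unpaired reads of about 250 nt each.
depth = fold_coverage(217_000, 250, genome_size)
print(f"genome size: {genome_size} bp, estimated 454 coverage: {depth:.1f}-fold")
```

With these rounded inputs the estimate falls close to the 24-fold pyrosequencing coverage reported.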

Genome annotation 

Annotation was carried out according to a published
protocol [73] with minor modifications. Specifically,
initial annotation was transferred from the related strain
L. salivarius UCC118 [74] and then manually curated in
Artemis [75]. PHAST [48] was used to identify prophage
regions within the genome sequence.

Data availability

The annotated genome sequence has been deposited in
GenBank under accession numbers CP007646 (chromosome),
CP007647 (pMP1046A), CP007648 (pMP1046B),
CP007649 (pLMP1046) and CP007650 (pCTN1046).

Genome comparisons 

Nucleotide alignments were generated using a local
installation of BLAST v 2.2.22 and were then visualized
and analyzed for gene conservation and sequence synteny
using the Artemis Comparison Tool (ACT) [76].
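
Pairwise BLAST comparisons of this kind are often inspected as tab-delimited hit tables before being loaded into a viewer. The following is a minimal sketch in Python, assuming the standard 12-column tabular BLAST output (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score). The output format, the thresholds and the example hit lines are assumptions for illustration and are not stated in the Methods.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_tabular(lines):
    """Parse 12-column tab-delimited BLAST hits, skipping comments and blank lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) < 12:
            raise ValueError(f"expected 12 columns, got {len(f)}")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9])))
    return hits


def filter_hits(hits, min_identity=75.0, min_length=100):
    """Keep hits at or above the identity and alignment-length thresholds."""
    return [h for h in hits if h.pident >= min_identity and h.length >= min_length]


# Hypothetical hit lines; identifiers and values are invented for illustration.
example = [
    "plasmidA\tplasmidB\t96.20\t1800\t60\t8\t1\t1800\t501\t2300\t0.0\t2950",
    "plasmidA\tplasmidB\t64.60\t90\t30\t2\t2500\t2589\t4000\t4089\t1e-05\t52.8",
]
kept = filter_hits(parse_tabular(example))
print(len(kept), kept[0].pident)
```

The 75% identity cut-off mirrors the lower bound of the identity shading used in Figure 4 and is otherwise arbitrary.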

Identification of novel genetic regions 

The Novel Region Finder module of Panseq v 2.0 [59]
was used to identify novel genomic regions in strain
JCM1046, compared to other L. salivarius genome
sequences. A minimum novel region size of 800 bp was
chosen and default Nucmer values were used.
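
Conceptually, a novel region is a stretch of the query genome left uncovered by alignments to the comparison genomes and at least as long as the chosen minimum size. The following is a minimal sketch of that idea in Python, assuming alignment intervals are already available as 1-based inclusive (start, end) pairs. It illustrates the principle only and is not Panseq's implementation; the interval values are hypothetical.

```python
def novel_regions(seq_len, aligned, min_size=800):
    """Return uncovered (start, end) intervals, 1-based inclusive, of at least min_size bp."""
    regions = []
    cursor = 1  # first position not yet known to be covered
    for start, end in sorted(aligned):
        if start > cursor and start - cursor >= min_size:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    # Trailing uncovered stretch after the last alignment.
    if seq_len - cursor + 1 >= min_size:
        regions.append((cursor, seq_len))
    return regions


# Hypothetical alignment intervals on a 10,000 bp sequence.
aligned = [(1, 2000), (1500, 3000), (3500, 6000), (7000, 9500)]
found = novel_regions(10_000, aligned)
print(found)
```

Gaps shorter than the minimum size, such as the 499 bp stretch between positions 3001 and 3499 in this toy input, are not reported.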

Additional files